This is a research project that explores ways to manipulate sound files in WAV format using Python packages from Anaconda. I did this project because I like music and playing with computers, and I wanted to try playing with music using some coding techniques. The project also covers basic concepts of sound processing, which can lead to further applications in audio editing and the music industry, such as GarageBand.
Before we get started, it's important to know what sound is. Sound is essentially a type of signal caused by vibrations that travel through a medium, and such a vibration can be interpreted as a wave. There are two major elements by which a wave can be examined: the amplitude and the frequency.
[Harman, Rosie, et al., 2019]
The amplitude is the maximum displacement reached by a wave. In other words, it's the intensity of the vibration. The amplitude of a sound wave determines the loudness of the sound; a greater amplitude means a louder sound. For example, average city traffic has an amplitude of 80 decibels, while a quiet library is only about 30 decibels. A person's hearing might start getting permanently damaged by noise over 85 decibels sustained for a prolonged period of time[1].
[Karen Hao & Youyou Zhou et al., 2018]: average city traffic has an amplitude of 80 decibels
[Damen et al., 2019]: a quiet library has an amplitude of only about 30 decibels
The frequency of a wave refers to the number of cycles that occur in a specific period of time, typically one second. In other words, it's how fast the vibration happens. The frequency determines the pitch of the sound; a higher frequency makes the sound higher. Generally, different musical instruments have different frequency ranges. The frequency of a flute can range from about 300 Hz to about 2500 Hz, and that of a violin from 200 Hz to more than 3500 Hz. A human being can hear sounds from around 20 Hz up to about 20000 Hz, but our hearing is most sensitive in the 2000-5000 Hz range[2].
[StringOvationTeam et al., 2019]
Therefore, by changing the amplitude or the frequency of a sound file, we can easily modify its volume and tone.
This project runs in Python 3, and we use Anaconda for environment management.
To install Python 3, go to this link: https://www.python.org/downloads/
Anaconda is a data platform that helps us access and manage plenty of useful packages and environments. It also includes a very helpful editor, Jupyter Notebook, which will be used in this project to write the code.
To download Anaconda, check this link: https://docs.anaconda.com/anaconda/install/
In this project, I will show a couple of different ways to modify a sound file, including cutting the audio, manipulating its intensity, changing the frequency, removing sound within a certain frequency range, combining two audio clips, and writing the edited sound as an output.
The first step is to import the libraries. The packages we use are numpy, matplotlib, scipy and IPython, all of which can be installed through Anaconda.
import numpy as np
import matplotlib.pyplot as plt
from scipy.io.wavfile import read,write
from IPython.display import Audio
from numpy.fft import fft, ifft
%matplotlib inline
numpy helps us with advanced numerical computing, data analysis and calculations. You can download numpy and get more information at this site: https://numpy.org/. It can also be installed from the Anaconda Navigator.
matplotlib is useful for creating visualizations. In this project it is used to plot the sound data as a graph. For more information, visit this site: https://matplotlib.org/.
IPython lets us play audio directly in the notebook. Check this link: https://ipython.org/.
scipy.io.wavfile allows us to read a sound file in WAV format. The WAV file is one of the simplest digital audio formats, and it works by storing the sound signal as binary data. We can see what the data is like by printing its "shape": in this case it is a 7431872*2 array, in other words 7431872 samples in each of two columns. The two columns are the two channels (left and right) of the sound file. The sampling frequency Fs for this sound file is 48000, meaning 48000 samples are processed per second. Note that the sampling frequency is not the same as the frequency of the sound.
Fs, data= read('coffindance.wav')
print(data.shape)
print(Fs)
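The relationship between the array shape, the sampling frequency, and the clip's duration can be checked on a small synthetic file. The tone, rate, and filename below are made up for illustration; they are not the project's actual file:

```python
import numpy as np
from scipy.io.wavfile import read, write

# Build a short synthetic stereo WAV (a 440 Hz tone) so the shapes can be
# checked without the original coffindance.wav file.
Fs = 48000
t = np.arange(Fs * 2) / Fs                      # 2 seconds of samples
tone = (10000 * np.sin(2 * np.pi * 440 * t)).astype(np.int16)
stereo = np.column_stack([tone, tone])          # two identical channels

write('demo.wav', Fs, stereo)
rate, data = read('demo.wav')

print(data.shape)          # (96000, 2): 96000 samples, 2 channels
print(len(data) / rate)    # 2.0 seconds of audio
```

Dividing the number of samples by the sampling frequency always gives the duration in seconds, which is exactly how the time axis of the plots below is computed.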
Using matplotlib, we can visualize the sound as a graph.
plt.figure()
plt.plot(np.arange(len(data))/Fs,data)
plt.xlabel("time(sec)")
plt.ylabel("amplitude")
plt.title("waveform of coffindance.wav")
plt.show()
In the graph above, the x-axis is the sample index divided by the sampling frequency, which gives time in seconds. The y-axis represents the amplitude, or the magnitude of each sample. The two channels are shown as two waveforms in different colors.
We can cut the sound by redefining "data". Because sound in a WAV file is stored as a sequence of numbers, removing numbers from the sequence deletes part of the sound. In the code below, data=data[:,1] redefines data as all the samples within channel 1, thus keeping only channel 1. If we print data.shape there it shows (7431872,), indicating that data has become a one-dimensional sequence instead of a two-dimensional matrix; only a single channel is left.
data=np.array(data)
data=data[:,1]
print(data.shape)
Use IPython.display to play the audio:
#Audio(data,rate=Fs)
#download.wav
We can further remove numbers from data to reduce its length. The code data=data[0:5400000] modifies data once again, keeping the first 5400000 samples out of all 7431872, so data.shape becomes (5400000,). When we play the audio again, it is noticeably shorter.
data=data[0:5400000]
print(data.shape)
#Audio(data,rate=Fs)
#download (1).wav
Looking back at the graph, the orange waveform (channel 0) is gone, and the length of the wave has been cut from around 7400000 samples to 5400000 samples, so the clip is shorter.
plt.figure()
plt.plot(np.arange(len(data))/Fs,data)
plt.xlabel("time(sec)")
plt.ylabel("amplitude")
plt.title("waveform of coffindance.wav")
plt.show()
Recall that the magnitude of the samples determines the intensity of the sound. Therefore, the intensity can be manipulated by changing the values of the numbers within the sequence. The easiest way is to multiply the data by a ratio. For example, if we want to increase the first 3700000 samples by half, we can use the following code: data[0:3700000]=data[0:3700000]*1.5. It replaces the part of data from sample 0 to sample 3700000 with 1.5 times the original values, making that section of the audio louder when we play it.
data[0:3700000]=data[0:3700000]*1.5
#Audio(data[0:3700000], rate=Fs)
#download (2).wav
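One caveat when multiplying: WAV samples are often 16-bit integers, and scaling them past the int16 limit (32767) can wrap around to negative values, which sounds like harsh distortion. A safer sketch (the `amplify` helper below is mine, not part of the project) scales in floating point and clips back into range:

```python
import numpy as np

def amplify(samples: np.ndarray, ratio: float) -> np.ndarray:
    """Scale int16 samples by `ratio`, clipping instead of wrapping around.

    Multiplying int16 data directly can silently overflow (values above
    32767 wrap to negative numbers), so we scale in float and clip back
    into the valid int16 range.
    """
    boosted = samples.astype(np.float64) * ratio
    return np.clip(boosted, -32768, 32767).astype(np.int16)

# A value near the int16 ceiling stays clipped instead of wrapping:
loud = np.array([30000, -30000, 1000], dtype=np.int16)
print(amplify(loud, 1.5))   # [ 32767 -32768   1500]
```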
It's also possible to make a gradual change in volume. One way to achieve this is to use a loop to modify many small pieces of the sound. In the code below, we create a slow decrease of amplitude from sample 3700000 to sample 5400000, gradually muting the music.
a=3700000
b=3701000
c=1
for i in range(0,1699):
    a=a+1000
    b=b+1000
    c=c+0.01
    data[a:b]=data[a:b]/c
#Audio(data[3700000:5400000], rate=Fs)
#download (3).wav
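The same fade can be written without a loop: numpy can compute one gain value per sample and apply them all in a single multiplication. This is a sketch of the idea on a tiny synthetic signal, not the project's loop translated exactly (the linear ramp here starts the fade at full volume and ends at silence):

```python
import numpy as np

def fade_out(samples: np.ndarray) -> np.ndarray:
    """Apply a linear fade from full volume down to silence.

    Instead of looping over 1000-sample chunks, one gain value per sample
    is computed with np.linspace and applied in a single multiplication.
    """
    gain = np.linspace(1.0, 0.0, num=len(samples))
    return (samples.astype(np.float64) * gain).astype(np.int16)

tone = np.full(5, 10000, dtype=np.int16)     # a tiny constant signal
print(fade_out(tone))                        # [10000  7500  5000  2500     0]
```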
We can clearly see the changes from the graph of the new modified music. Its amplitude changes just as we expected.
plt.figure()
plt.plot(np.arange(len(data[3700000:5400000]))/Fs,data[3700000:5400000])
plt.xlabel("time(sec)")
plt.ylabel("amplitude")
plt.title("waveform of coffindance.wav")
plt.show()
The frequency of the sound determines the pitch. By manipulating the frequency, we can change the character of the music itself. Because the first sound file has been modified, we load another file here and keep only one channel for simplicity.
fS,DATA= read('xuehuapiaopiao.wav')
print("sampling frequency#2 is",fS)
DATA=DATA[:,0]
print(DATA.shape)
#Audio(DATA,rate=fS)
#download (4).wav
The first way to manipulate the frequency of the sound is to change the sampling frequency, which, again, is the rate at which the samples in a WAV file are played back. The following code multiplies the sampling frequency by 1.5.
fS= fS*1.5
#Audio(DATA,rate=fS)
#download (5).wav
Because the sampling frequency is just a number, we can also set it directly to any integer we like.
fS=47000
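Note that changing the playback rate rescales pitch and duration together: play the same samples 1.5x faster and every frequency rises by 50% while the clip also finishes 1.5x sooner. A quick calculation (the sample count is taken from earlier in the project; the 1.5 factor matches the code above):

```python
# Playing the same samples at a different rate changes pitch and duration
# together. With the original rate of 48000 Hz:
original_rate = 48000
num_samples = 7_431_872            # sample count from the first file

new_rate = original_rate * 1.5     # played 1.5x faster
pitch_ratio = new_rate / original_rate
old_duration = num_samples / original_rate
new_duration = num_samples / new_rate

print(round(pitch_ratio, 2))                   # 1.5: every frequency rises 50%
print(round(old_duration / new_duration, 2))   # 1.5: the clip is 1.5x shorter
```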
Now for the other method of changing the frequency. This method involves a concept called the Fourier transform. What is the Fourier transform? In mathematics, conceptually, any wave whose amplitude is not zero can be seen as a combination of multiple (or infinitely many) periodic waves (like sine and cosine functions). This link shows how it looks when multiple periodic waves form a more complicated wave: https://en.wikipedia.org/wiki/File:Fourier_series_square_wave_circles_animation.gif
Therefore, technically a wave can also be decomposed into these periodic waves from which it was built. When that happens, these waves can be arranged by their frequencies, as the following image shows:
[Vink, Ritchie, et al., 2017]
Thus we can get a new view of all these waves in another domain, with their frequencies as the x-axis and their amplitudes as the y-axis.
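A toy check of this idea: the FFT of a pure 440 Hz sine should show one dominant spike at the bin corresponding to 440 Hz. (The tone and rate here are chosen for illustration; in general, bin k of an N-point FFT corresponds to k * Fs / N Hz, and choosing N equal to Fs makes bin k exactly k Hz.)

```python
import numpy as np
from numpy.fft import fft

# The FFT of a pure 440 Hz sine sampled at 48000 Hz should have its
# largest magnitude at bin 440 (here N equals Fs, so bin k is k Hz).
Fs = 48000
t = np.arange(Fs) / Fs                 # one second of samples
wave = np.sin(2 * np.pi * 440 * t)

spectrum = np.abs(fft(wave))
peak_bin = int(np.argmax(spectrum[:Fs // 2]))   # positive frequencies only
print(peak_bin)   # 440
```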
numpy allows us to decompose a complicated wave into multiple waves with different frequencies. The following code defines DATA_F as the decomposed DATA; the result is an array of coefficients arranged from low frequency to high frequency.
print(DATA.shape)
DATA_F=fft(DATA)
print(DATA_F)
print(DATA_F.shape)
We can then remove some of the waves, keeping only waves within a certain frequency range. In the code below, DATA_F[20000:,]=0 and DATA_F[:10000,]=0 keep the waves in the range from index 10000 to 20000 (these are FFT bin indices, not frequencies in Hz, but they are related to frequency linearly by some coefficient); and DATA_processed=ifft(DATA_F) converts the waves back into the original form.
DATA_F[20000:,]=0
DATA_F[:10000,]=0
print(DATA_F)
DATA_processed=ifft(DATA_F)
DATA_processed=np.real(DATA_processed)
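The bin indices can also be computed from a target range in Hz: bin k corresponds to k * Fs / N Hz, where N is the length of the data. The helper below is a rough sketch of that idea (the function name and the test tones are mine, not from the project), verified on a mix of a 300 Hz and a 3000 Hz tone:

```python
import numpy as np
from numpy.fft import fft, ifft

def band_pass(samples: np.ndarray, rate: int,
              low_hz: float, high_hz: float) -> np.ndarray:
    """Keep only components between low_hz and high_hz (a rough sketch).

    FFT bin k corresponds to k * rate / N Hz, so the Hz limits are
    converted to bin indices before zeroing everything outside the band.
    For simplicity this also drops the mirrored negative-frequency bins,
    which halves the output amplitude.
    """
    n = len(samples)
    spectrum = fft(samples)
    low_bin = int(low_hz * n / rate)
    high_bin = int(high_hz * n / rate)
    mask = np.zeros(n)
    mask[low_bin:high_bin] = 1.0
    return np.real(ifft(spectrum * mask))

# A mix of 300 Hz and 3000 Hz tones; the band pass keeps only the low one.
rate = 8000
t = np.arange(rate) / rate
mix = np.sin(2 * np.pi * 300 * t) + np.sin(2 * np.pi * 3000 * t)
filtered = band_pass(mix, rate, 100, 1000)
```

A production filter would also keep the symmetric negative-frequency bins (or use np.fft.rfft) to preserve the amplitude, but the sketch shows the bin arithmetic.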
The graph looks quite different when plotted. Any sound outside of the range has been removed.
plt.figure()
plt.plot(np.arange(len(DATA_processed[0:5000000]))/Fs,DATA_processed[0:5000000])
plt.xlabel("time(sec)")
plt.ylabel("amplitude#2")
plt.title('waveform of test.wav')
plt.show()
Actually, you can hardly hear any sound now, because we kept only a very narrow frequency band.
#Audio(DATA_processed,rate=Fs)
#download (6).wav
We can try redoing it and removing the high-pitched flute at the beginning. (We have to redefine DATA_F to restore it to the original.)
DATA_F=fft(DATA)
DATA_F[190000:,]=0
DATA_processed=ifft(DATA_F)
DATA_processed=np.real(DATA_processed)
#Audio(DATA_processed,rate=Fs)
#download (7).wav
To sum two sounds, we can simply add their "data" together, but they must have the same length. Here we are going to add a metronome to the sample music clip. First, load the audio files and cut them to the same length.
FS, DAtA=read('bensound-anewbeginning.wav')
print(FS)
fs, datA=read('Record.wav')
print(datA.shape)
DAtA=DAtA[0:710000,0]
datA=datA[0:710000,0]
#Audio(DAtA, rate=FS)
#download (8).wav
#Audio(datA, rate=FS)
#download (9).wav
We add up the two sounds and store the result in a new variable, data_a. The new sound turns out to be a combination of the two.
data_a=DAtA+datA
#Audio(data_a, rate=FS)
#download (10).wav
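As with amplification, adding two int16 arrays can overflow and wrap around where both clips are loud at the same moment. A safer sketch (the `mix` helper is mine, for illustration) does the sum in a wider integer type and clips:

```python
import numpy as np

def mix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Sum two equal-length int16 clips, clipping to avoid wrap-around.

    Adding int16 arrays directly can overflow (e.g. 30000 + 30000 wraps
    to a negative value), so the sum is done in int32 and clipped back.
    """
    total = a.astype(np.int32) + b.astype(np.int32)
    return np.clip(total, -32768, 32767).astype(np.int16)

a = np.array([30000, 100, -30000], dtype=np.int16)
b = np.array([30000, 200, -30000], dtype=np.int16)
print(mix(a, b))   # [ 32767    300 -32768]
```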
All the steps above introduce several methods for changing a sound file: we manipulate the amplitude and frequency to cut, combine, amplify, mute, and extend the sound, and even remove sounds within specific frequency ranges. By using these techniques properly, we can refine music and turn it into almost any form we like.
To write the edited audio as an output:
write('output.wav',FS,data_a)
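One detail worth knowing: scipy.io.wavfile.write picks the WAV bit depth from the array's dtype, so results that ended up as floats (after fades or FFT filtering) should be converted back to int16 before writing. A quick round-trip check on a short synthetic clip (the filename and random signal below are illustrative):

```python
import numpy as np
from scipy.io.wavfile import read, write

# scipy picks the WAV bit depth from the array's dtype, so float results
# should be converted back to int16 before writing.
rate = 44100
clip = np.clip(np.random.randn(rate) * 8000, -32768, 32767).astype(np.int16)
write('output_demo.wav', rate, clip)

# Reading the file back returns exactly the samples we wrote.
rate_back, clip_back = read('output_demo.wav')
print(rate_back == rate and np.array_equal(clip, clip_back))   # True
```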
Blythe, Sally Goddard, et al. Attention, Balance and Coordination: The A.B.C. of Learning Success, pp. 377-378. https://onlinelibrary.wiley.com/doi/pdf/10.1002/9781119164746.app2.
Clason, Debbie. "What is a decibel?" https://www.healthyhearing.com/report/52514-What-is-a-decibel. June 28, 2018. Accessed July 1, 2020.
Music: a new beginning from www.bensound.com.